Gene Ontology Similarity Measures Based on Linear Order Statistics

Authors

  • James M. Keller
  • James C. Bezdek
  • Mihail Popescu
  • Nikhil R. Pal
  • Joyce A. Mitchell
  • Jacalyn M. Huband
Abstract

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.
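
As an illustration only, a linear order statistic can be read as a weighted sum of sorted values. The sketch below assumes that pairwise similarities between the GO terms of two gene products are already available as a matrix. The function names, the convention that weights sum to 1, and the example numbers are hypothetical and are not taken from the article.

```python
from typing import Sequence


def linear_order_statistic(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the values after sorting them in ascending order.

    Assumption: weights are non-negative and sum to 1, so min, median and max
    are special cases (all weight on the first, middle or last position).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values)
    return sum(w * v for w, v in zip(weights, ordered))


def los_similarity(term_sims: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Aggregate a (hypothetical) term-by-term similarity matrix into one score."""
    flat = [s for row in term_sims for s in row]
    return linear_order_statistic(flat, weights)


# Made-up similarities between two terms of product A and two terms of product B.
term_sims = [[0.2, 0.9],
             [0.5, 0.4]]

max_score = los_similarity(term_sims, [0.0, 0.0, 0.0, 1.0])      # largest value only
mean_score = los_similarity(term_sims, [0.25, 0.25, 0.25, 0.25])  # plain average
```

Shifting weight toward the last positions emphasizes the best-matching term pairs, while uniform weights reduce to an average over all pairs.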

Similar resources

A Topology-Based Metric for Measuring Term Similarity in the Gene Ontology

The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. Whil...

Information Content-Based Gene Ontology Functional Similarity Measures: Which One to Use for a Given Biological Data Type?

The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated...

Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to...

An ontological hybrid recommender system for dealing with cold start problem

Recommender Systems (RSs) are expected to suggest the accurate goods to the consumers. Cold start is the most important challenge for RSs. Recent hybrid RSs combine  and . We introduce an ontological hybrid RS where the ontology has been employed in its  part while improving the ontology structure by its  part. In this paper, a new hybrid approach is proposed based on the combination of demog...

Methods of Normalization the Results of Gene Ontology Term Similarity

The article addresses the issue of improvement of the results quality when Gene Ontology (GO) term similarity is calculated. Several GO similarity measures produce results out of the range [0; 1]. Whereas, in order to compare different similarity measures or apply further processing, it is needed to normalise the results to this range. The most popular and well-known method of normalization is ...

Journal title:
  • International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

Volume 14

Publication date: 2006